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Abstract 

'.vj  propose  a  trodulog/  ter  fitting  theoretical  models  to  data.  The  dependent 
variable  (or  response)  and  the  model  are  transformed  in  the  same  way.  Two  types  of 
transformations,  power  transformation  and  weighting,  are  user!  together  to  remove 
skewness  and  to  induce  constant  variance.  Our  method  is  applied  to  the  stock- 
recruitment  data  of  four  fish  stocks.  Also  discussed  are  estimates  of  the 
conditional  mean  and  the  conditional  quantiles  of  the  original  response. 

1.  Introduction 


In  regression  analysis  one  seeks  to  establish  a  relationship  between  a  response 
y  and  independent  variables  x  =  (x^,. . . ,x^) ' .  Often  the  physical  or  biological 
system  generating  the  data  suggest  that  in  the  absense  of  random  error  y  =  f(x,0) 
where  f  is  a  known  function  and  9  is  an  unknown  parameter.  In  reality  random 
variability,  modeling  errors,  and  measurement  errors  will  cause  y  to  deviate  fran 
f (x,0). 

Usually  0  is  estimated  by  the  (possibly  nonlinear)  leas t -squares  estimator  0 
which  minimizes 

A 

^  (Yfc  -  f(xt,0))2 
t=l 

where  yfc  and  x^  =  (x^t, . . . (X^) '  are  the  t-th  observations  of  the  response  and  the 
independent  variables,  t=l,...,N.  The  method  of  least  squares  is  not  uniquely 
determined  in  the  following  sense.  If  h(y)  is  a  monotonic  function  then  y  =  f(x,0> 
implies  that  h(y)  =  h(f(x,0)).  In  the  absense  of  randcm  error,  there  is  an  infinity 
of  possible  models,  one  for  each  h.  Taking  h(y)  to  be  the  new  dependent  variable 
and  h(f(x,0))  to  be  the  new  regression  model,  the  least-squares  estimate  0(h), 
depending  on  h,  tain Lr.ir.es 
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This  paper  addresses  the  question  "hew  should  h  be  chosen?"  In  the  past  h  was 
often  chosen  so  that  the  model  was  linear  in  the  parameters,  but  with  the  wide 
availability  of  nonlinear- least  squares  software  linearization  is  no  longer 
necessary.  Instead  h  should  be  selected  so  that  6(h)  has  high  statistical 
efficiency  in  sane  sense,  say  lew  asymptotic  mean  square  error.  For  nonlinear 
models  finite-sample  mean  square  errors  are  seldom  known  and  large-sample,  or 
asymptotic,  mean  square  errors  must  be  used. 

Least-squares  estimation  is  asymptotically  efficient  when  the  errors  are 
additive,  nonrally  distributed,  and  hcmoscedastic,  that  is,  if 


yt  =  +  et  '  (1> 

2  2 

where  €.,...,6^  are  independent  N(0,o-  )  random  variables  for  sane  o-  >0.  If  (i) 
fails  to  hold  for  the  original  response  and  model,  it  may  hold  after  these  have  been 
transformed.  For  example,  if  the  errors  are  multiplicative  and  lognormal,  then 

log(yi)  =  logffOc^.,©))  + 

so  that  h(y)  =  log(y)  is  the  appropriate  transformation. 

In  general,  it  is  impossible  to  know  a  priori  hew  the  random  errors  affect  y. 

It  is,  however,  often  reasonable  to  assume  as  an  approximation  that  for  an  unknown  h 

h(yt)  =  h<f (x^,©) )  +  €t  and  6t  _  N(0,o-2), 

and  then  to  estimate  h  from  the  data. 

Carroll  and  Ruppert  (1984)  introduced  a  methodology  where  h  is  assumed  to 
belong  to  a  parametric  class  such  as  the  class  of  power  functions.  Specifically, 
h(y)  =  h(y,\)  where  h(y,\)  is  a  known  parametric  family  and  \  is  an  unknown 
parameter.  Then  0,  o-,  and  X  are  estimated  simultaneously,  for  example  by  maximum 
likelihood. 

Box  and  Ccx  (1964)  used  a  modified  power  transformation  family 

y<X>  =  (yX-l>/\  X/0 

*  log(y)  \=0» 

which  includes  the  log  transformation  in  a  natural,  i.e. ,  continuous,  fashion.  It 
should  be  mentioned  that  Bex  and  Cox  were  concerned  with  a  quite  different 
transformation  methodology.  They  transform  only  the  response,  not  the  model;  see 
Carroll  and  Ruppert  (1984). 


1} 


h  was 


'  :  (1) 


Another  transformation  family  which,  as  we  will  see,  is  useful  in  stock- 
recruitment  analysis  is  division  by  (ut)oc  where  ufc  is  a  known  constant  and  a  is  an 
unknown  parameter.  Since  ut  is  a  constant  which  does  not  depend  on  yfc,  a 
transformation  of  this  kind  has  no  effect  on  skewness  but  it  induces 
homoscedasticity  if  (ut)a  is  the  standard  deviation  of  yfc.  The  constant  ufc  can  be 
one  of  the  independent  variables,  a  variable  not  depending  on  or  y^,  or  a 
function  of  such  variables.  Division  by  (u  )a  will  be  called  a  "pcwer  weighting 
transformation".  Note  that  ujf  denotes  ordinary,  not  Bex -Cox,  pewer  transformation 
of  u  so  that  u^  =  1  when  or  =  0. 

This  paper  is  restricted  to  a  model  combining  a  Bex -Cox  power  transformation 
with  a  power  weighting  transformation: 


ytX>  7  ut  =  f(X)(V£,/ut  +  6i* 


Hoover,  the  methodology  that  is  developed  here  can  be  applied  to  other 
transformation  families. 

According  by  model  (2),  y*^1  is  symmetrically  distributed  about  f^(x*.,9). 

(X)  ) 

Therefore,  f  <*tf9)  9^ves  the  conditional  (given  )  mean  and  median  of  y  *  . 

This  implies  further  that  the  untransformed  model  f  (xt,9)  gives  the  conditional 

median  of  the  un transformed  response  yfc.  Hcwever,  the  conditional  mean  of  yfc  is  not 

f<xv,0)  except  if  \  =  1.  The  conditional  mean  is  discussed  below.  We  see  then 

that  f  <xt,G)  has  two  interpretations;  it  gives  the  value  of  y  if  there  is  no  errur, 

and  otherwise  it  gives  the  conditional  median  of  yfc. 

In  our  examples,  =  x^t  is  the  size  of  a  spawning  stock  Sfc  and  the  response 
yt  is  the  total  return  Rt.  Carmonly  used  model  functions  f  (x,0d  include  the  Ricker 
(1954)  model 


Rfc  =  exp(91  +  ©2  Sfc)  , 


and  the  Beverton-Holt  (1957)  model 


Rt  =  V(9l  +  Vst)- 


We  will  let  ufc  =  Xfc.  Then  (3)  can  be  transformed  to 


Rt/Sfc  =  exp(81  +  ©2  S^) 


by  letting  \=1  and  a=l.  By  using  \=0  and  oc=C  on  (5),  one  obtains 


1°9<W  -  »i  +  »2  Sf 


Wvi'R'ovSvivOriv 
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Equation  (6)  is  often  favored  since  it  is  linear  in  the  parameters. 
Equation  (4)  can  be  linearized  to 


1/3  .  6.  +  0.  (1/S  ) 

(7) 

VRt  *  *2  *  ei  St 

(8) 

corresponding  to  \=-l  and  a=0  and  \-l  and  a=-l,  respectively. 

By  estimating  \  and  cc  we  can  tell  when  these  linearizing  transformations  are 
appropriate  for  a  given  set  of  data,  and  we  can  find  more  suitable  transformations 
when  the  linearization  is  not  appropriate. 

Although  we  advocate  transforming  the  response  to  achieve  an  error  structure 
where  least-squares  is  efficient,  often  one  must  estirrate  characteristics  (such  as 
the  conditional  near.)  of  the  original  response.  In  section  5  estimates  of  the 
conditional  mean  and  tne  conditional  quantiles  cf  y  given  are  presented. 


2.  Power  transformations ,  skewness,  and  heteroscedasticity 

In  this  section,  we  discuss  hew  power  transformations  affect,  and  in  particular 

remove,  skewness  and  heteroscedasticity.  Suppose  that  the  randan  variable  y  has 

2  z 
near.  and  variance  o-^  =  g(mt);  the  same  function  g  applies  for  each  t.  If  yt  is 

transformed  to  h(yt> ,  then  by  a  Taylor  approximation  as  in  Bartlett  (1947),  the 

variance  of  h(yt)  is 

E{h(yt)  -  E  h(yt)}2  = 


E(h(yt )  -  h(mt)}2 
=  <h(mt))2  E{yt  -  mt}2 
*  (h(mt))2  g(mfc) 

•  • 
where  h(y)  =  d/dt  h(y).  The  variance  of  h(y  )  is  approximately  constant  if  h(y)  is 
_i  z 

proportional  to  g  (y). 

In  many  cases  g  is  a  pewer  function.  For  example,  g(m)  =  m  for  Poisson- 

2 

distributed  data  and  g(m)  is  proportion  to  m  if  the  coefficient  of  variation  (CV) 

2 

is  constant.  Also,  g(m)  =  m  for  exponentially  distributed  data.  When 
g(m)  <x  m2^  ^  then  h(m)  a  g~^  (m)  if  h(y)  =  y^.  In  the  case  \  =  0  (constant 
CV;,  the  log  transformation  stabilizes  the  variance,  and  for  Poisson  data  \  =  l/2j 
the  square-root  transformation  is  indicated. 

The  effect  of  h  on  skewness  is  also  revealed  by  h.  Suppose  that  y  is 
positively  skewed.  The  extended  right  tail  in  the  distribution  of  y  is  reduced 
through  transformation  to  h(y)  if  large  values  of  y  are  " compressed  together"  more 


( 
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(7) 

(8) 

5  are 
itions 


rticular 
K  has 
is 


j  )  is 


3n  (CV) 


r"  more 


than  snail  values;  more  precisely,  if  for  all  /\  >  0,  |h(y^+/\)  -  h(y^)|  <  |h(y2+/V 
-  h { ) |  whenever  y^  >  y2*  Such  compression  occurs  if  h(y)  is  a  decreasing 
function,  in  which  case  h  is  called  concave. 

•  _i  \ 

When  h(y)  is  a  Bax-Ccx  power  transformation  then  h(y)  =  (d/dy)  X  <y  -1)  when 
X/0  or  h(y)  =  (d/dv)  log  y  when  X  =  0.  In  either  case,  h(y)  =  y*  ,  and 
consequently  h  is  concave  if  \  <  1.  Bex -Cox  transformations  reduce  right-skewne?c 
when  \  <  1  and  the  amount  of  reduction  increases  as  X  decreases. 

Left  skewness  if  reduced  by  transformations  that  are  convex,  that  is,  which 
have  an  increasing  derivative.  If  X  >  1<  then  y^*  is  convex. 


3 .  Simultaneous  power  transformation  and  weighting  by  maximum  likelihood. 

Although  power  transformations  can  reduce  both  skewness  and  heteroscedasticity, 
the  value  of  X  inducing  normality,  or  at  least  a  reasonably  symmetric  distribution, 
need  not  be  the  same  X  which  transforms  to  constant  variance.  A  more  flexible 
approach  to  modeling  carbines  a  3cx-Ccx  power  transformation  to  normality  with  a 
power  weighting  transformation  to  hcnoscedasticity  as  in  equation  (2).  In  this 
section  we  discuss  the  estimation  of  X,  at,  8,  and  o-  by  maximum  likelihood  and  the 
construction  of  a  confidence  region  for  X  and  cx  by  likelihood-ratio  testing. 

To  find  the  likelihood,  we  first  note  that  the  density  of  is 

(2  |T  o -V*  exp(--S2/ (2  o-2)) 

and  the  Jacobian  of  €t  -»  is  yX~Vu®.  Therefore,  the  conditional  density  of 

yt  given  xfc  and  ut  is 

f<yt|xt,ut,9,o-,ac,X)  = 

(27J  o-2)_i  (i,t"1/ut)  e*P<“ty^>  "  f(X>(xtri)]2/(2  o-2  u2<X>). 

If  xt  and  ut  are  fixed  constants,  not  depending  on  y^, . . . ,yfc_^,  then  the  log- 
likelihood  of  y^,...,yN  is 

L(X,a,6,o-2)  =  -N/2  log(27T  °-2) 

JL  JL 

+  (X-l)  ^  log(yt)  -  <x  ^  log  (ufc) 
t=l  t=l 

-  (1/2  o-2)  ^  {y£X)  -  f(X)(Xt'£>>2/uta  * 
t=l 

If  xfc  and  ut  depend  on  previous  values  of  y,  then  L(\,ac,0,o-2)  is  still  the 


•  ■_*  *  >  "  »  '  m  ' 

cc  -  .A'! 


!  •  .V  *!: 


» 
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conditional  log-likelihood  of  y1,...,yu  given  x^x^x^, ...  and  u^u^u^,...  ,  and 
maximization  of  L(X,a,S,o-2)  should  still  produce  good  estimates  of  \,<x,9,o -2.  The 
distinction  between  conditional  and  unconditional  estimates  from  time  series  data  is 
discussed  in  Box  and  Jenkins  (1976,  Chapter  7).  The  dependence  of  xfc  and  ut  on  past 
y  may,  however,  produce  biases.  Walters  (1985)  found  sizeable  biases  in  a  Monte 
Carlo  study  with  small  sample  sizes,  N  =  10.  His  study  assumed  that  X  was  known 

and  equal  to  0.  His  formulas  show  that  the  bias  should  decrease  as  N  increases. 

.  .  2 

Following  Eox  and  Cox  (1964),  we  maximize  L(X,oc,S,o-  )  in  two  stayes.  First, 
for  fixed  X  and  a  the  MLE  of  0  is  the  nonlinear,  weighted  least-squares  estimator 
0(X,a)  which  minimizes 


SS(  X,oc,0)  =  ^  (y'X> 

t^i 


f !)J  13^,8)  }2/\3** 


ana  the  MLE  of  o-2  is  o-2(X,a)  =  N  1  SS(X,ac,  0(X,oO).  Define 


L  ( X#cx)  =  L(X,a,  0(X,a),  6-(.\,<x.)) 

ITclX 

to  be  the  log-likelihood  traximized  with  respect  to  0  and  o-  for  fixed  X  and  a.  An 

approximate  MLE  is  fourid  by  computing  L  (X-O'!  on  a  grid;  in  section  4  wc  use 

m£ix 

oc  =  — 1  (.25)1  and  X  =  — 1  (.25)1.  If  the  exact  MLE  is  needed  then  L  (X»oc)  can  be 

JTkxX 

maximized  by  a  numerical  optimization  technique  using  the  approximate  MLE  as  an 
initial  value.  However,  an  exact  MLE  is  probably  unnecessary  for  most  applications. 

The  function  can  be  used  to  test  hypotheses  and  to  construct 

confidence  regions  for  (X,oO.  For  a  given  null  value  (Xg,  <Xg),  one  can  test  H^: 
(X,oc)  =  (Xg»  cxQ)  against  <X,  a)  ^  (Xgr  «g)  by  the  likelihood  ratio  test.  The 
log-likelihood  ratio  is 

LR(X0>  «g)  =  Ljnax'X,  «)  -  b^^g,  «g>- 

Here  (X,  <x)  is  the  MLE  or  approximate  MLE.  Hg  is  rejected  at  level  6  if 
2  LR(Xg#  «g)  exceeds  X^d,  l-€),  the  100(1-6)  percentile  of  the  chi-square 
distribution  with  one  degree  of  freedom  (Rao,  1973,  section  6e).  This  test  can  be 
applied  to  each  value  (Xg/  «g)  on  the  grid,  and  a  100(1-6)  percent  confidence 
region  for  (X,  oc)  consists  of  all  null  values  which  are  not  rejected. 


Examples 


These  data  and  the  population  B  data  discussed  later  wsre  obtained  through 
Professor  Carl  Walters  (pers.  cairn. ).  Permission  to  identify  the  stocks  has  been 
refused  by  the  original  source.  There  are  twenty-eight  years  of  data.  The 
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ita  is 
1  past 


variables,  Rt  =  total  return  and  Sfc  =  spawner  escapement,  are  plotted  in  figure  1. 
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Spawners  (numbers  of  fish) 

Fig.  1:  Population  A.  Actual  recruitment  from  1940  to  1967, 
estimated  median  recruitment  based  on  the  Ricker  and 
Beverton-Holt  (BH)  models,  and  estimated  mean  recruit¬ 
ment  based  on  the  Ricker  model. 


We  first  fit  the  (transformed)  Ricker  model: 


R^/S*  =  (St  exp^  +  ©2  St))(X,/S*  +  6t  . 


The  approximate  MLE  (on  the  grid  X  =  — 1  ( . 25 )  1  and  <x  =  — 1  ( . 25 )  1 )  is  (X/  *)  = 

(.25,0)  and  the  maximized  log-likelihood  is  -338.3106.  With  i\,<x)  equal  to  the 
MLE,  model  (9)  becomes 

5^  =  exp((61  +  e2  St>/4>  +  et  . 

Confidence  regions  foe  (X,  oc)  ate  given  in  Table  1.  The  95%  univariate  confidence 
regions  for  X  and  a  are  {0,.25,.5)  and  {-.25,0, .25, .5, .75},  respectively. 


Lambda 


Alpha 


was  fit.  The  MLE  is  (X-^  a)  =  (.2b, .25)  and  the  maximum  log- likelihood  is 
-337.0919.  The  difference  between  the  naximum  log- likelihood  for  the  Beverton-Holt 
and  Ricker  models  is  only  1.2187  which  indicates  that  both  models  fit  almost  as 
well,  though  by  this  criterion  the  Beverton-Holt  model  does  provide  a  slightly 
better  fit. 

The  estimated  median  return,  calculated  as  described  in  section  5,  is  plotted 
in  figure  1  for  both  the  Ricker  and  Beverton-Holt  models,  as  is  the  "smearing 
estimate  (section  5)  of  the  mean  assuming  the  Ricker  model. 

A  researcher  with  only  linear  regression  software  might  be  tempted  to  linearize 
both  the  Ricker  and  Beverton-Holt  models  and  then  to  compare  them  on  their 
linearizing  scales.  When  the  linearized  Ricker  model 

iog(Rt/st)  =  eL  +  e2  st 

•  2 

is,  fit  R  =  0.207  and  the  F-value  for  testing  overall  significance  of  the  model  is 
6.79  (p  =  0.015).  If  the  linearized  Beverton-Holt  model 

1/Rt  =  ex  +  02/st 

2 

is  fit  then  R  =  0.000549,  F  =  0.01,  p  =  .9058,  and  if  the  alternative  linearized 
raodel 


2 

is  fit  then  R  =  0.110,  F  =  0.29,  and  d  =  .5950.  A  researcher  ccrnparing  these 
models  only  on  their  linearizing  scales  might  easily  be  tempted  into  concluding  that 
the  Ricker  model  provides  a  far  better  fit.  On  the  contrary,  we  believe  that  the 
Beverton-Holt  model  is  very  slightly  better  fitting,  but  the  log  transformation  is 
vastly  superior  to  the  inverse  transformation. 

It  is  interesting  to  see  how  well  the  MLE  transformation  achieves  both 
hcmoscedasticity  and  near  normality.  In  table  2  the  skewness  and  kurtosis  axe  given 
for  the  three  transformed  Ricker  models: 

(I)  R/S  =  exp(ei  +  62  S)  (\=1,  <x=l) 

(II)  log'R)  =  loc(S)  +  9^  +  ©2  S  (\=0,  a=0 ) 

(III)  !hr=  j/s'exp{(91  +  02  51/4)  ( 25 ,  a=0i, 

that  is,  no  power  transformation,  log  trar.sf on.  at ior.  end  the  MLE,  respectively. 


.er.bda 

Alpha 

Skewness 

Correlation 

Kurtosis 

Spearman 

] 

l 

.99 

2.69 

.39 

0 

0 

-1.04 

1.22 

.12 

.25 

0 

-.27 

-.23 

.14 

Table  2:  Population  A.  Skewness  and  kurtosis  of  residuals  and  Spearman  rank 
correlation  between  the  absolute  residuals  and  the  predicted  values.  The  Ricker 
model  was  used. 

For  model  (III)  both  skewness  and  kurtosis  are  closest  to  0,  their  theoretical  value 
under  normality.  Also  in  table  2  are  the  Spearman  rank  correlations  between  the 
absolute  residuals  and  the  fitted  values.  Since  the  fitted  values  are  an  increasing 
function  of  S,  the  rank  correlation  is  unchanged  if  the  fitted  values  are  replaced 
by  S.  Clearly,  \=0  and  \=. 25  both  transform  to  near  homos cedasticity  where  there 
is  little  or  no  relationship  between  the  mean  and  variance  of  R.  However,  the  MLE 
\=.25  is  preferable  to  \=0  since  \=0  "overtransforms”  to  negative  skewness. 

Normality  and  hcmoscedasticity  of  the  residuals  can  be  checked  graphically  by  a 
normal  plot  of  the  residuals  and  a  plot  of  the  residuals  against  the  fitted  values. 
The  graphical  analysis  of  residuals  should  be  done  routinely  for  transformation 
models  as  for  ordinary  regression  models.  Draper  and  Smith  (i?(!i)  provide  an 
excellent  account  of  graphical  residual  analysis.  We  plotted  the  residuals  frem  the 
MLE  and  found  no  evidence  of  non-normality  or  heter os cedasticity.  A  normal 


probability  plot  of  the  residuals  is  roughly  linear,  though  the  very  slight  negative 
skewness  (table  2)  is  evident.  For  these  data,  a  plot  of  the  residuals  versus  the 
predicted  values  is  not  easy  to  interpret;  the  Ricker  and  Beverton-Holt  curves  are 
nearly  constant  over  the  observed  range  of  S  so  the  predicted  values  are  quite 
similar  except  for  those  corresponding  to  the  smallest  values  of  S. 

4.2  Skeena  River  sockeye  salmon 

For  this  stock,  effective  escapement  (S)  and  total  return  (R)  are  given  for 
yeurs  1940  to  1967  in  Ricker  and  Smith  (1975).  These  data  are  plotted  in  figure  2 
along  -with  the  estimated  medians  (section  5)  for  the  Ricker  and  Beverton-Holt 
models.  Compared  with  the  Population  A  stock,  the  Skeena  River  data  show  less 
positive  skewness  but  more  heteroscedasticity.  Figure  2  shows  several  particularly 
low  values  of  R  associated  with  high  'values  of  S  but  only  one  particularly  high 
value.  This  may  indicate  over compensation,  or  it  may  simply  be  dee  to  the  high 
variability  in  R  'when  5  is  large. 


Spavners  (thousands  of  fish) 

Fig.  2:  Skeena  River  sockeye.  Actual  recruitment  from 
1940  to  1967  and  estimated  median  recruitment 
based  on  Beverton-Holt  and  Ricker  models. 

The  95%  confidence  regions  and  the  MLE  of  (\,  at)  are  the  same  for  the  Ricker 

and  Beverton-Holt  models  and  they  are  given  in  table  3.  The  maximum  log- likelihoods 

for  the  Ricker  and  for  the  Beverton-Holt  models  differ  by  only  0.164,  so  there  is 

little  to  suggest  one  model  over  the  other.  In  particular,  the  data  provide  no 
strong  evidence  of  over compensation . 
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Lambda  Alpha 


0 

-.25 

* 

0 

* 

.25 

.5 

.75  1 

.25 

★ 

* 

* 

★ 

.5 

★ 

* 

* 

* 

.75 

★ 

* 

★  * 

*  * 

*  *  -k  * 


*  =  951  confidence  region. 

**  =  MLE 

Table  2-’  Skeena  River  sockeye  salmon  stock.  95%  confidence  region  and  maximum 
likelihood  estimate  assuming  either  the  Ricker  or  Beverton-Holt  model. 

Compared  with  the  Population  A  data,  the  MLE  here  results  n  loss 
transformation  (X  =  .75  instead  of  X  =  .25  as  before)  but  more  weighting  ;a  =  .5 
instead  of  a  =  0  or  .25).  This  is  consistent  with  the  observation  that  the 
•on transformed  Skeena  data  exhibit  little  skewness  but  considerable 
heteroscedasticity. 


4.3  Pacific  Cod,  Necate  Strait  (Walters,  et  al. ,  1982) 

The  twenty-one  observations,  1959  to  1979,  on  this  stock  are  plotted  in  figure 
3  along  with  predicted  medians  for  two  models. 

Recruitment  is  virtually  independent  of  spawning  stock  except  that  the  two 
unusually  large  recruitments  occur  when  S  is  rather  snail.  For  this  reason  the 
Beverton-Holt  model  fits  poorly.  For  this  stock  we  will  consider  the  Ricker  model 
and  the  power  model 


Rt  =  61  St 


With  Box-Cox  and  weighting  transformations  the  pcwer  model  becomes 


r£x>/s®  =  (e:  St2)(X)  /  S*  +  efc  . 


Confidence  regions  for  (\,  ac)  are  given  in  table  4.  The  maximum  log-likelihood  is 
-34.9690  and  -37.2541  for  the  power  and  Ricker  models,  respectively,  and  the  large 
difference  (2.2851)  indicates  lack-of-fit  for  the  Ricker  model. 
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-  ’  ~  - 


0- 


Spawners  (millions  of  fish) 

Hecate  Strait  Pacific  Cod.  Actual  ricrui'nent 

from  19S9  to  1979  and  estimated  median  recruit¬ 
ment  based  on  the  Power  and  Ricker  models. 


■ig- 


Lambda 


Alpha 


-1  -.75 

-.5 

-.25 

0 

.25 

.5 

.75 

1 

-1 

* 

* 

* 

* 

-.75 

* 

* 

* 

* 

* 

-.5 

* 

* 

* 

* 

* 

* 

-.25 

* 

* 

* 

* 

* 

★ 

* 

0 

* 

* 

★ 

* 

** 

* 

* 

.25 

* 

* 

** 

** 

* 

* 

* 

.5 

* 

* 

* 

* 

* 

* 

* 

.75 

* 

* 

* 

* 

* 

* 

1 

* 

* 

* 

* 

*  =  95%  Confidence  region. 

**  -  MLE  or  within  .1  of  maximizing  the  log-likelihood. 

Table  £.  Hecate  Strait  Pacific  cod  stock.  95%  confidence  region  and  maximum 
likelihood  estimate  assuming  the  power  model. 
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We  can  test  the  Ricker  model  as  follows.  The  general  model 

Rt  =  StJ  exp^  +  02  St) 

includes  the  Ricker  model  (0^  =  1)  and  the  power  model  (0^  =  0)  as  special  cases. 

One  tests  the  Ricker  model  by  testing  H^:  0^  =  1  against  0^  /  1.  The  naximum 
log-likelihood  is  -34.9022  and  the  log-likelihood  maximized  subject  to  the 
constraint  9,  =  1  is  -37.2541  (see  above).  Twice  the  difference  is  4.704  which 
exceeds  X  (1,  .95)  =  3.84.  The  Ricker  model  is  reject  at  level  .05.  By  the  same 
reasoning  the  power  model  is  accepted  since  34.9690  -  34.9022  =  .0668  is  very  small. 

The  95%  confidence  region  for  (\,  at)  assuming  the  power  model  is  quite  large 
because  (a)  there  are  only  21  data  points  and,  more  importantly,  (b)  the  variability 
in  both  spawning  stock  and  return  is  snail  canpared  to  the  previously  examined 
stocks.  The  confidence  region  for  the  Ricker  model  is  even  larger,  but  this  is  to 
be  expected  considering  the  lack  of  fit. 

For  the  power  model,  the  MLE  is  (\,  ex)  =  (0,  .5),  but  the  log- likelihood  at 
X  =  .25  and  at  =  .25  or  .5  is  within  0.1  of  the  maximum. 

4.4  Population  B  (from  Professor  Carl  Walters, per s.  canm. ) 

For  this  stock,  R  and  vary  over  a  much  wider  range  than  for  the  three 
preceding  stocks,  and  it  seems  preferable  to  use  the  return  tc  spawner  ratio,  Rt/St# 
as  the  response.  A  plot  of  log  <Rt/st>  against  is  rather  linear,  but  most  of  the 
data  are  bunched  together  in  the  range  0  <  St  <  300,000  while  the  remainder  are 
scattered  over  300,000  <  Sfc  <  3,300,000,  so  in  figure  4  log  (R/S)  is  plotted  against 
log  (S). 

The  only  model  studied  here  is  the  transformed  Ricker  model 

(Rt/St)(X)  /  S*  -  {exp(01  +  e2  St)}<)0  /  S*  +  et  . 

Also  the  grid  <x  =  -1(1/411  is  replaced  by  <x  =  -1/4(1/16)1/4  because  values  of  at 
far  from  0  fit  poorly  and  lead  to  convergence  problems  when  0^  and  @2  are  computed. 
The  95%  confidence  region  for  (\,  at)  and  the  MLE  are  found  in  table  5.  Clearly, 
the  confidence  region  is  quite  small.  Because  Rfc  and  vary  over  extremely  wide 
ranges,  X  and  at  are  well  determined  by  the  data. 

5.  Estimating  the  Conditional  Distribution  of  y 
We  have  seen  how  to  utilize  the  model 

yiX>  /  u?  -  fU)<2£i'  /  u*  +  6. 

to  efficiently  estimate  9.  The  model  expresses  a  transformed  response  as  a  function 
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of  6,  x^  and  the  normally  (or  approximately  normally)  distributed  6^.  Typically 
interest  centers  on  the  untransformed  response  yfc.  In  this  section  it  is  shown  how 
to  estimate  the  conditional  (given  x^)  mean  of  y^  as  well  as  conditional  quantiles 
such  as  the  median. 


5  10  15 

Log  (Spawners) 

Fig.  4:  Poriulation  B.  Estimated  median  production 
ratio  based  on  the  Ricker  model .  Recruits 
and  spawners  are  in  numbers  of  fish.  The 
production  ratio  and  spawners  are  expressed 
in  logarithms. 
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Lambda  Alpha 

-4/16  -3/16  -2/16  -1/16  0  1/16  2/16  3/16  4/16 

-.25  *  * 


0 


★ 


irk  k  k 


.25 


*  * 


*  * 


*  -  in  95%  confidence  regions 

**  =  MLE 

Table  5.  Population  3.  95%  confidence  region  assuming  the  Ricker  model  R./S  = 

exp(0^  +  02  S) . 

Now  cannot  be  exactly  normally  distributed  in  most  cases,  for  example  when 
y^  and  f(x,  0)  are  non-negative  and  \/0.  It  is  better  to  suppose  that  6^  is  nearly 
norrral  but  with  a  bounded  range.  Fortunately,  for  present  purposes  we  need  not  make 
any  assumption  about  the  {6^  except  that  they  are  independent  and  identically 
distributed. 

If  y<X)  =  x,  then  the  inverse  relationship  is  y  =  (1  +  Xx)1^  if  \  ^  0  and 
y  =  exp(x)  if  \  =  0.  In  what  follows  we  assume  that  \  /  0.  If  X  =  0,  then  one 
simply  replaces  (1  +  Xx)^^  by  exp(x).  Using  12)  we  can  express  the  original 
response  y^  as  a  function  of  x^,  u^,  and  the  parameters  \,<x,Q: 

yi  =  {1  +  X  0)  +  X  uj  ei)1/x  . 

With  this  representation  we  can  study  the  untransformed  y^.  Let  F  be  the 
distribution  function  of  6^,  ...,  e^.  The  conditional  mean  of  y^  given  x^  is 

E(yi I— =  id  +  X  f '^(x^  £>  +  X  u“  €}1/X  dF(€)  .  (10) 

Let  cjp  be  the  pth  quantile  of  F,  i.e.,  F(q^)  =  p.  Then  the  conditional  pth  quantile 
of  y^  given  is 

qp(yi|Xi)  =  {1  +  X  f(X)(x.,  £)  +  X  uf  qp)l/X 
Duan  (1983)  has  proposed  the  "smearing  estimate"  of  K(y^|x^): 
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_N_  . 

E<yi  1— i.)  =  N_1  )  {1  +  X  f<X)  !V  +  X  u®  et)1/x 

t^i 


-where  6_  is  the  t-th  residual, 


et  =  {y£X>  -  f(X)  (Jt.,  0)}  /  u®  . 


(11) 


The  relationship  between  (10)  and  (11)  is  clear;  all  parameters  are  replaced  by 
their  estimates  and  averaging  with  respect  to  the  theoretical  distribution  of  is 
replacing  by  averaging  over  the  sample  6^,  ...»  £^..  Even  if  F  were  known,  (9)  could 
only  be  evaluated  by  numerical  integration,  but  (10)  does  not  require  integration. 
This  is  a  distinct  advantage  of  the  shearing  estimate. 

Let  Op  oe  the  p-tn  sample  quantile  of  (6^}.  Then  we  estimate  cjp ( )  by 


qo(yi I— i }  =  {l  + 


f(\)  "  oc  *  ,1/V 

f  ix.,  ti  +  \  u.  q  ! 

i  i  p 


In  the  case  of  the  median  (p  =  .5),  the  residuals  should  have  a  median  close  to  0 
and  we  can  replace  q  by  0.  The  estimated  conditional  median  is  then 

m(yi* 

=  u  +  X  0)}1/x  =  £<xire)  . 

As  mentioned  before,  in  figure  1  Elylxt)  and  m(y|x)  are  plotted  for  Population 
A  assuming  the  Ricker  model,  and  ffi(yjx)  is  plotted  for  the  Beverton-Holt  model  as 
well.  The  mean  return  is  always  considerably  larger  than  the  median  return.  This 
reflects  the  considerable  positive  skewness  seen  in  the  actual  recruitments  and 
evident  in  the  MLE  of  X,  \  *  .25. 

Because  recruitment  is  so  highly  variable  any  realistic  managsnent  model  will 
be  stochastic,  and  when  a  stochastic  model  is  constructed  it  is  vital  that  the 
entire  conditional  distribution  of  given  be  estimated.  This  can  be  done  by 
estimating  conditioned  quantiles  as  above.  Ruppert,  Reish,  Deriso,  and  Carroll 
(1985)  use  a  closely  related  method  for  estimating  conditional  quantiles  when 
constructing  a  stochastic  model  of  the  Atlantic  menhaden  population.  Transformation 
of  the  menhaden  stock-recruitment  data  is  discussed  further  in  Carroll  and  Ruppert 
(1984). 


Summary  and  Conclusions 

A  theoretical  model  relating  a  response  y  to  independent  variables  x  and 
parameters  0  may  not  be  suitable  in  its  original  form  for  least-squares  estimation. 

This  is  the  case  if  the  response  exhibits  skewness  or  nonconstant  variance. 
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However ,  if  the  response  and  the  model  function  are  transformed  in  the  same  way, 
then  the  transformed  response  may  be  approximately  normally  distributed  with  a 
nearly  constant  variance  and  then  least-squares  estimation  will  be  efficient. 

In  this  paper  we  propose  combining  the  Box-Ccx  power  transformation  with 
weighting  by  u*,  where  is  a  variable  not  depending  on  y  and  at  is  a  parameter  to 
be  estimated.  Another  possibility,  one  that  is  not  explored  here,  is  to  have  ufc  be 
the  predicted  value  fOo,  0).  Weighting  the  untransformed  response  by  |fOr,  9)1® 
is  studied  in  Bax  and  Hill  (1974),  Pritchard,  Downie,  and  Bacon  (1977)  and  Carroll 
and  Ruppert  (1982). 

The  Box-Cox  parameter  X  and  the  power  weighting  parameter  a  can  be  estimated 
simultaneously  with  9  and  o-  by  maximum  likelihood.  A  confidence  region  for  ( \,  a) 
car.  be  constructed  by  likelihcod-ratio  testing. 

Four  stock-recruitment  data  sets  were  analyzed  by  our  transformation 
itf th-odciory.  The  original  response,  return,  is  in  each  case  skewed  or 
heteroscedastic.  Only  for  the  Pacific  cod  stock  is  \=1  and  a=0,  corresponding  to 
no  transformation,  in  the  95%  confidence  region.  On  tne  other  hand  \=0  and  <x=0, 
which  is  the  linearizing  transformation  for  the  Ricker  model,  is  in  the  95% 
confidence  region  for  all  four  stocks.  Also,  the  inverse  transformation  (\  =  -1) 
which  linearizes  tne  Beverton-Holt  is  not  in  the  95%  region  for  any  of  the  four 
stocks.  If  botn  the  Ricker  and  Beverton-Holt  models  are  linearized,  then  the  Ricker 
model  ./ill  appear  better  fiLtmy,  not  because  it  is  necessarily  superior  but  because 
the  log  transformation  is  mure  suitable  them  the  inverse  transformation.  There  are 
examples  where  \=>-l  is  quite  suitable,  for  example  the  Atlantic  menhaden  stock 
(Carroll  and  Ruppert,  1984). 

After  \  and  a  have  been  estimated  by  maximum  likelihood,  the  fitted  model 
should  be  checked  by  residual  analysis  as  described  in,  for  example,  Draper  and 
Smith  (1981).  Maximum  likelihood  estimatiion  is  highly  sensitive  to  outlying 
observations.  Such  influential  points  rray  be  evident  from  the  residuals.  Robust 
estimators  for  our  transformation  model  is  an  important  area  for  future  research. 

At  present,  robust  estimation  has  been  studied  only  for  the  rather  different 
methodology  where  only  the  response,  not  the  model,  is  transformed;  see  Carroll  and 
Ruppert  (1985),  Carroll  (1980),  and  Bickel  and  Doksum  (1981). 

The  model  function  gives  the  conditional  median  of  the  response,  and  the 
conditional  mean  and  the  other  conditional  quantiles  of  the  response  can  be  easily 
estimated.  By  estimating  conditional  quantiles  one  in  effect  builds  a  model  of  the 
skewness  and  heteroscedasticity  in  the  untransformed  response.  Such  a  model  is  a 
crucial  part  of  a  realistic  stochastic  managanent  model  of  the  stock. 

We  have  used  four  fish  stocks  as  examples  of  our  proposed  statistical 
methodology,  but  our  analyses  should  not  be  considered  definitive.  For  example,  we 
did  not  consider  the  effects  on  the  Skenna  River  stock  of  the  1951-2  rock  slide, 
changes  in  exploitation  rates,  an  artifical  spawning  channel  opened  in  1965,  or 
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interactions  between  year  classes;  see  Ricker  and  Smith  (1975).  We  believe, 
however,  that  power  and  weighting  transformations  will  be  equally  as  useful  for  more 
elaborate  models  as  for  the  basic  ones  that  we  have  employed  for  illustrative 
purposes. 
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